NSF PAR Search | NSF Public Access Repository

COSTELLO: Contrastive Testing for Embedding-Based Large Language Model as a Service Embeddings

https://doi.org/10.1145/3643767

Jiang, Weipeng; Zhai, Juan; Ma, Shiqing; Zhang, Xiaoyu; Shen, Chao (July 2024, Proceedings of the ACM on Software Engineering)

Large language models have gained significant popularity and are often provided as a service (i.e., LLMaaS). Companies like OpenAI and Google provide online APIs of LLMs to allow downstream users to create innovative applications. Despite their popularity, LLM safety and quality assurance remain well-recognized concerns in the real world, requiring extra effort to test these LLMs. Unfortunately, while end-to-end services like ChatGPT have garnered rising attention in terms of testing, LLMaaS embeddings have received comparatively less scrutiny. We emphasize the importance of testing embeddings and uncovering problematic individual ones independently of downstream applications. The abstraction and non-interpretability of embedding vectors, combined with the black-box inaccessibility of LLMaaS, make testing a challenging puzzle. This paper proposes COSTELLO, a black-box approach that reveals potential defects in abstract embedding vectors from LLMaaS by contrastive testing. Our intuition is that high-quality LLMs can adequately capture the semantic relationships of input texts and properly represent those relationships in the high-dimensional space. Given the LLMaaS interface and seed inputs, COSTELLO automatically generates test suites and outputs words with potentially problematic embeddings. The idea is to synthesize contrastive samples, both positive and negative, by mutating seed inputs under guidance. Our synthesis guidance leverages task-specific properties to control the mutation procedure and generate samples with known partial relationships in the high-dimensional space. We can then compare the expected relationship (the oracle) with the embedding distance (the LLM output) to locate potentially buggy cases. We evaluate COSTELLO on 42 open-source (encoder-based) language models and two real-world commercial LLMaaS.
Experimental results show that COSTELLO can effectively detect semantic violations, where on average more than 62% of violations result in erroneous behaviors (e.g., unfairness) of downstream applications.
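The abstract describes the contrastive oracle only at a high level: a positive sample should sit closer to its seed in embedding space than a negative sample does. A minimal sketch of that check follows, assuming cosine distance as the embedding metric and synonym/antonym word swaps as the guided mutations. Both are illustrative assumptions, not COSTELLO's documented operators. The mutation tables, the `TOY_EMBEDDINGS` lookup standing in for an LLMaaS endpoint, and the helper names `make_triple` and `find_violations` are all hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[str, str, str]  # (seed, positive, negative)

# Hypothetical mutation tables: a synonym swap should preserve meaning (positive
# sample); an antonym swap should change it (negative sample).
SYNONYMS = {"good": "great", "tasty": "delicious"}
ANTONYMS = {"good": "bad", "tasty": "bland"}

# Hypothetical stand-in for an LLMaaS embedding endpoint: a fixed lookup table.
# The "delicious" vector is deliberately placed far from "tasty" to plant a defect.
TOY_EMBEDDINGS = {
    "the movie was good": [1.0, 0.1],
    "the movie was great": [0.9, 0.2],
    "the movie was bad": [-1.0, 0.1],
    "the food was tasty": [0.5, 0.5],
    "the food was delicious": [-0.4, 0.6],
    "the food was bland": [0.5, 0.45],
}


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cosine similarity; smaller values mean closer embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def make_triple(seed: str) -> Optional[Triple]:
    """Mutate the first swappable word into a positive and a negative sample."""
    words = seed.split()
    for i, word in enumerate(words):
        if word in SYNONYMS and word in ANTONYMS:
            positive = words[:i] + [SYNONYMS[word]] + words[i + 1:]
            negative = words[:i] + [ANTONYMS[word]] + words[i + 1:]
            return seed, " ".join(positive), " ".join(negative)
    return None  # no guided mutation applies to this seed


def find_violations(
    embed: Callable[[str], Vector],
    triples: Sequence[Triple],
    margin: float = 0.0,
) -> List[Tuple[str, str, str, float, float]]:
    """Flag triples whose embedding distances contradict the oracle.

    Oracle: the positive sample should be closer to the seed than the
    negative sample by more than `margin`.
    """
    violations = []
    for seed, positive, negative in triples:
        e_seed = embed(seed)
        d_pos = cosine_distance(e_seed, embed(positive))
        d_neg = cosine_distance(e_seed, embed(negative))
        if d_pos + margin >= d_neg:
            violations.append((seed, positive, negative, d_pos, d_neg))
    return violations


if __name__ == "__main__":
    seeds = ["the movie was good", "the food was tasty"]
    triples = [t for t in map(make_triple, seeds) if t is not None]
    for seed, pos, neg, d_pos, d_neg in find_violations(TOY_EMBEDDINGS.__getitem__, triples):
        print(f"violation on {seed!r}: d({pos!r})={d_pos:.3f} >= d({neg!r})={d_neg:.3f}")
```

With the planted vectors, only the "the food was tasty" seed should be reported, because its synonym lands farther from the seed than its antonym does. Replacing `TOY_EMBEDDINGS.__getitem__` with a function that calls a real embedding service would leave the oracle logic unchanged; the `margin` parameter lets a tester require a minimum gap before a triple counts as consistent.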
Full Text Available
Merlin: Multi-tier Optimization of eBPF Code for Performance and Compactness

https://doi.org/10.1145/3620666.3651387

Mao, Jinsong; Ding, Hailun; Zhai, Juan; Ma, Shiqing (April 2024, ACM)

Full Text Available
Towards General Robustness Verification of MaxPool-Based Convolutional Neural Networks via Tightening Linear Approximation

https://doi.org/10.1109/CVPR52733.2024.02339

Xiao, Yuan; Ma, Shiqing; Zhai, Juan; Fang, Chunrong; Jia, Jinyuan; Chen, Zhenyu (June 2024, IEEE)

Full Text Available
Correlations between Deep Neural Network Model Coverage Criteria and Model Quality

https://doi.org/10.1145/3368089.3409671

Yan, Shenao; Tao, Guanhong; Liu, Xuwei; Zhai, Juan; Ma, Shiqing; Xu, Lei; Zhang, Xiangyu (October 2020, Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’20))
Full Text Available
C2S: translating natural language comments to formal program specifications

https://doi.org/10.1145/3368089.3409716

Zhai, Juan; Shi, Yu; Pan, Minxue; Zhou, Guian; Liu, Yongxiang; Fang, Chunrong; Ma, Shiqing; Tan, Lin; Zhang, Xiangyu (November 2020, Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering)
Full Text Available
CPC: automatically classifying and propagating natural language comments via program analysis

https://doi.org/10.1145/3377811.3380427

Zhai, Juan; Xu, Xiangzhe; Shi, Yu; Tao, Guanhong; Pan, Minxue; Ma, Shiqing; Xu, Lei; Zhang, Weifeng; Tan, Lin; Zhang, Xiangyu (June 2020, Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering)
Full Text Available
LAMP: data provenance for graph based machine learning algorithms through derivative computation

https://doi.org/10.1145/3106237.3106291

Ma, Shiqing; Aafer, Yousra; Xu, Zhaogui; Lee, Wen-Chuan; Zhai, Juan; Liu, Yingqi; Zhang, Xiangyu (January 2017, Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering)

Full Text Available